Abstract

Murthy and Sethi (Sankhya Ser B 27, 201-210 (1965)) gave a sharp upper bound 
on the variance of a real random variable in terms of the range of values of that 
variable. We generalise this bound to the complex case and, more importantly, to 
the matrix case. In doing so, we make contact with several geometrical and matrix 
analytical concepts, such as the numerical range, and introduce the new concept of 
radius of a matrix. 

We also give a new and simplified proof for a sharp upper bound on the Frobenius
norm of commutators recently proven by Böttcher and Wenzel (Lin. Alg. Appl. 429
(2008) 1864-1885) and point out that at the heart of this proof lies exactly the
matrix version of the variance we have introduced. As an immediate application of
our variance bounds we obtain stronger versions of Böttcher and Wenzel's upper
bound.

1 Variance bounds for a real random variable



The variance Var(X) of a random variable X that can assume the real values
$x_i$ and does so with probabilities $p_i$ is defined as

$$\mathrm{Var}(X) = \sum_i p_i x_i^2 - \Big(\sum_i p_i x_i\Big)^2 = \sum_i p_i \Big(x_i - \sum_j p_j x_j\Big)^2. \qquad (1)$$

It is of interest in mathematical statistics to have upper bounds on this
variance. A simple upper bound is given by

$$\mathrm{Var}(X) \le \sum_i x_i^2/2, \qquad (2)$$

which follows directly from a much sharper variance bound, due to Murthy
and Sethi [11].

Lemma 1 (Murthy-Sethi) Let X be a real random variable satisfying
$m \le X \le M$. Then $\mathrm{Var}(X) \le (M-m)^2/4$.

Since $(M-m)^2/4 = (m^2+M^2)/2 - (m+M)^2/4 \le (m^2+M^2)/2 \le \sum_i x_i^2/2$,
this bound immediately implies the bound (2).

Proof. The argument, adapted from Muilwijk [10], goes as follows. Some
elementary algebra will convince the reader of the following equality:

$$\mathrm{Var}(X) = \sum_i p_i (x_i - m)(x_i - M) + (\mu - m)(M - \mu),$$

where $\mu = \sum_i p_i x_i$. Because $x_i - m \ge 0$ and $x_i - M \le 0$, the first term is
non-positive (while the second is non-negative). Hence $(\mu - m)(M - \mu)$ is an
upper bound on Var(X). By the arithmetic-geometric mean inequality,

$$\sqrt{(\mu-m)(M-\mu)} \le ((\mu - m) + (M - \mu))/2 = (M - m)/2,$$

and the bound follows.

The inequality is sharp as equality is achieved for a distribution where X is
either m or M with probability 1/2. □
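Lemma 1 is easy to check numerically. The following is a minimal sketch, with hypothetical helper names (`variance`, `murthy_sethi_bound`) and illustrative data that do not come from the paper; it evaluates the variance of equation (1) and compares it with the bound $(M-m)^2/4$, including the two-point distribution for which the bound is attained.

```python
def variance(values, probs):
    """Variance of a discrete real random variable, as in equation (1)."""
    mean = sum(p * x for p, x in zip(probs, values))
    return sum(p * (x - mean) ** 2 for p, x in zip(probs, values))


def murthy_sethi_bound(values):
    """Upper bound (M - m)^2 / 4 of Lemma 1, with m and M the extreme values."""
    return (max(values) - min(values)) ** 2 / 4


# Illustrative data (an assumption, not taken from the paper).
values = [-1.0, 0.5, 2.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]
assert variance(values, probs) <= murthy_sethi_bound(values)

# Equality case: probability 1/2 on each of the end points m and M.
assert abs(variance([-1.0, 3.0], [0.5, 0.5]) - murthy_sethi_bound([-1.0, 3.0])) < 1e-12
```

Any other distribution on the same range can be substituted for `probs`; the first assertion should continue to hold.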

In this paper we will derive various generalisations of the Murthy-Sethi (MS)
bound, and will highlight its geometric nature. The first generalisation
concerns complex-valued random variables (section 5), and this will carry over
in a straightforward way to a matrix generalisation of variance, in the special
case that the matrix is normal (section 6). Then, in section 7, we consider our
main objective of a generalisation of variance that includes non-normal
matrices. Along the way we relate these variance bounds to the concept of radius
of a set of points, and to the new concept we introduce here of Cartesian
radius of a matrix (not to be confused with spectral radius, nor with numerical
radius).

Before we embark on these generalisations, however, we first describe the
seemingly unrelated problem of finding sharp bounds on certain norms of a
commutator [X, Y] in terms of the norms of X and Y (section 3). In section 4 we
give a new proof of a known result and show that at the heart of it lies the
concept of variance of a matrix. The variance bounds we will obtain in this
paper can therefore be applied to commutators straight away and allow us to
derive new bounds on norms of commutators.



2 Notations 

In this paper we are concerned with several kinds of matrix norms. First of
all, as the most general class we'll consider the unitarily invariant (UI) norms,
which we denote using the symbol $|||.|||$. As is well-known, any UI norm of a
matrix X can be expressed in terms of the singular values of X, denoted $\sigma_i(X)$.
As is customary, we assume that singular values are sorted in non-increasing
order. For an $n \times m$ matrix X, $\sigma_1(X) \ge \sigma_2(X) \ge \ldots \ge \sigma_N(X) \ge 0$, with
N = min(n, m).

Special classes of UI norms are the Schatten p-norms, the Ky Fan k-norms,
and the Ky Fan (p, k)-norms. The Schatten p-norms are the non-commutative
analogues of the $\ell_p$ norms and are defined, for any $p \ge 1$, as

$$\|X\|_p := (\mathrm{Tr}\,|X|^p)^{1/p},$$

where $|X|$ denotes the (left)-modulus of X,

$$|X| := (X^*X)^{1/2}.$$

In terms of singular values, $\|X\|_p = (\sum_{i=1}^N \sigma_i(X)^p)^{1/p}$. For p = 2, we retrieve
the Frobenius norm, also called Hilbert-Schmidt norm,

$$\|X\|_2 = \Big(\sum_{i=1}^n \sum_{j=1}^m |X_{ij}|^2\Big)^{1/2}.$$

The Ky Fan k-norms are the sums of the k largest singular values,

$$\|X\|_{(k)} = \sum_{i=1}^k \sigma_i(X).$$

Intermediate between these norms are the Ky Fan (p, k)-norms [6], which are
defined as

$$\|X\|_{(k),p} = \Big(\sum_{i=1}^k \sigma_i(X)^p\Big)^{1/p}.$$
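Since all three families of norms depend on the matrix only through its singular values, they can be sketched as functions of a list of singular values. The function names below are hypothetical, and the singular values are assumed to have been computed beforehand by some other means.

```python
def schatten_norm(sv, p):
    """Schatten p-norm from a list of singular values (p >= 1)."""
    return sum(s ** p for s in sv) ** (1.0 / p)


def ky_fan_norm(sv, k):
    """Ky Fan k-norm: sum of the k largest singular values."""
    return sum(sorted(sv, reverse=True)[:k])


def ky_fan_pk_norm(sv, p, k):
    """Ky Fan (p, k)-norm: l_p norm of the k largest singular values."""
    top = sorted(sv, reverse=True)[:k]
    return sum(s ** p for s in top) ** (1.0 / p)


# Illustrative singular values (an assumption, not from the paper).
sv = [3.0, 2.0, 1.0]
# The (p, k)-norm interpolates between the other two families:
assert abs(ky_fan_pk_norm(sv, 1, 2) - ky_fan_norm(sv, 2)) < 1e-12        # p = 1
assert abs(ky_fan_pk_norm(sv, 2, len(sv)) - schatten_norm(sv, 2)) < 1e-12  # k = N
```

The two assertions illustrate in what sense the (p, k)-norms are intermediate: p = 1 recovers the Ky Fan k-norm, and k = N recovers the Schatten p-norm.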

We will use several special matrices repeatedly: the $n \times n$ identity matrix $\mathbb{1}_n$,
or just $\mathbb{1}$ if there is no risk of confusion; the standard matrix basis element $e^{ij}$,
which has a 1 in position $(i, j)$ and all zeroes elsewhere; the standard vector
basis element $e^i$; and the Pauli matrices known from quantum physics,

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

We denote the diagonal matrix with diagonal elements $(x_1, x_2, \ldots, x_n)$ by
$\mathrm{Diag}(x_1, x_2, \ldots, x_n)$. Finally, we need the matrix Diag(1, 1, 0, ..., 0) so often
that we assign it the symbol F.

We also use a concept from quantum mechanics called the density matrix. 
Disregarding the physical interpretations, we call a matrix a density matrix iff 
it is positive semi definite and has trace 1. This implies that both the vector 
of eigenvalues and the vector of diagonal elements (in any orthonormal basis) 
are formally discrete probability distributions, being composed of non-negative 
numbers and summing to 1. We denote density matrices by lower case Greek
letters $\rho$ and $\sigma$. The set of $d \times d$ density matrices is convex and its extremal
points are the rank 1 matrices $\psi\psi^*$, where $\psi$ can be any normalised vector in
$\mathbb{C}^d$.



3 Commutator bounds 



The commutator of two matrices (or operators) X and Y is defined as [X, Y] =
XY - YX and plays an important role in many branches of mathematics,
mathematical physics, quantum physics, and quantum chemistry. In [3],
Böttcher and Wenzel studied the commutator from the following mathematical
viewpoint: fixing the Frobenius norm of X and Y, they asked "How big
can the Frobenius norm of the commutator be and how big is it typically?"

By a trivial application of the triangle inequality and Hölder's inequality one
finds that $\|[X,Y]\|_2 \le 2\|X\|_2\|Y\|_2$. However, it appears that 2 is not the best
constant. It is straightforward to show in the case where X and Y are normal
that the best constant is actually $\sqrt{2}$. Numerical experiments led Böttcher
and Wenzel to conjecture that $\sqrt{2}$ is also the best constant when X and Y
are not normal. Their conjecture can be stated thus:

Theorem 1 (Böttcher and Wenzel) For general complex matrices X and
Y, and for the Frobenius norm $\|.\|_2$,

$$\|[X,Y]\|_2 \le \sqrt{2}\,\|X\|_2\,\|Y\|_2. \qquad (3)$$

The inequality is sharp.

We state it here as a theorem because the conjecture has been proved since.

Equality is obtained for X and Y two anti-commuting Pauli matrices; say
$X = \sigma_x$ and $Y = \sigma_z$, then $[X,Y] = -2i\sigma_y$. This gives $\|[X,Y]\|_2 = 2\sqrt{2}$ and
$\|X\|_2 = \|Y\|_2 = \sqrt{2}$.
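The equality case can be verified directly. The sketch below uses plain nested lists for the $2 \times 2$ matrices and hypothetical helper names (`matmul`, `commutator`, `frobenius`); it confirms that the commutator of the two Pauli matrices equals $-2i\sigma_y$ and that both sides of (3) agree.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def commutator(x, y):
    """[X, Y] = XY - YX."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(len(xy[0]))]
            for i in range(len(xy))]


def frobenius(a):
    """Frobenius (Schatten 2) norm."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))


SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

c = commutator(SX, SZ)
# [sigma_x, sigma_z] = -2i sigma_y
assert c == [[-2j * v for v in row] for row in SY]
# Both sides of (3) coincide for this pair.
assert abs(frobenius(c) - math.sqrt(2) * frobenius(SX) * frobenius(SZ)) < 1e-12
```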

As already mentioned, the case of normal matrices is rather easy. For
non-normal real $2 \times 2$ matrices the proof is also easy, and László proved the $3 \times 3$
case [8]. The first proof for the real $n \times n$ case was found by Seak-Weng Vong
and Xiao-Qing Jin [13] and independently by Zhiqin Lu [9]. Finally, Böttcher
and Wenzel found a simpler proof [4] that also includes the complex $n \times n$
case.

The impetus behind the present paper was the desire to find an even shorter
and more conceptual proof, that would also allow natural generalisations to
prove extensions of the theorem. One can indeed ask for the sharpest constant
when [X, Y], X and Y are compared in terms of different Schatten norms.
That is:

Problem 1 Let X and Y be general square matrices, and $p, q, r \ge 1$, such
that $1/p \le 1/q + 1/r$ holds. What is the smallest value of c such that

$$\|[X,Y]\|_p \le c\,\|X\|_q\,\|Y\|_r$$

holds? We will denote this smallest c by $c_{p,q,r}$.

The restriction $1/p \le 1/q + 1/r$ is necessary. When it is not satisfied, c is
dimension dependent, just as in the case of Hölder's inequality. To see this,
take two fixed non-zero X and Y for which $\|[X,Y]\|_p \le c\|X\|_q\|Y\|_r$ holds,
with some predetermined finite value of c. Then replace X and Y by $X \otimes \mathbb{1}_D$
and $Y \otimes \mathbb{1}_D$, with large value of D. The left-hand side of the inequality is
thus multiplied by $D^{1/p}$, while the right-hand side is multiplied by $D^{1/q+1/r}$. If
$1/p > 1/q + 1/r$, the left-hand side grows faster with D than the right-hand
side, and for some large enough D, the inequality will be violated for any
initial choice of c. Thus, if $1/p > 1/q + 1/r$, there is no finite c for which the
inequality holds universally, and henceforth we only consider the case when
$1/p \le 1/q + 1/r$.
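The scaling argument can be made concrete. Tensoring with the $D \times D$ identity repeats every singular value D times, and the commutator of the tensored matrices is the tensored commutator, so the ratio of the two sides can be tracked from singular values alone. The sketch below uses hypothetical helper names and, as an illustrative choice, the singular values of the Pauli pair $X = \sigma_x$, $Y = \sigma_z$ with p = 1 and q = r = 4, for which 1/p > 1/q + 1/r.

```python
def schatten_norm(sv, p):
    """Schatten p-norm from a list of singular values."""
    return sum(s ** p for s in sv) ** (1.0 / p)


def tensor_with_identity(sv, D):
    """Singular values of X (x) 1_D: each singular value of X repeated D times."""
    return [s for s in sv for _ in range(D)]


def ratio(D, p, q, r):
    """||[X,Y]||_p / (||X||_q ||Y||_r) after tensoring the Pauli pair with 1_D."""
    comm = tensor_with_identity([2.0, 2.0], D)  # singular values of -2i sigma_y
    x = tensor_with_identity([1.0, 1.0], D)     # singular values of sigma_x
    y = tensor_with_identity([1.0, 1.0], D)     # singular values of sigma_z
    return schatten_norm(comm, p) / (schatten_norm(x, q) * schatten_norm(y, r))


# With 1/p - 1/q - 1/r = 1/2 the ratio grows like D^(1/2), so no finite c works.
for D in (1, 4, 16, 64):
    print(D, ratio(D, 1, 4, 4))
```

When 1/p = 1/q + 1/r the same function returns a value independent of D, in line with the dimension-free constant of that regime.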

Numerical experiments have led us to conjecture:

Conjecture 1 For the restricted case p = q,

$$c_{p,q,r} = c_{p,p,r} = 2^{\max(1/p,\ 1-1/p,\ 1-1/r)}. \qquad (4)$$

By taking special examples of X and Y we can calculate lower bounds on the
constant $c_{p,q,r}$. This allows us to check that the conjectured bounds would be
sharp.

(1) The two anti-commuting Pauli matrices $X = \sigma_x$ and $Y = \sigma_z$ give
$\|[X,Y]\|_p = 2^{1+1/p}$, $\|X\|_q = 2^{1/q}$ and $\|Y\|_r = 2^{1/r}$, hence

$$c \ge 2^{1+1/p-1/q-1/r}.$$

(2) The choices $X = e^{12}$ and $Y = e^{21}$ give $[X,Y] = \sigma_z$, hence $\|[X,Y]\|_p = 2^{1/p}$
and $\|X\|_q = \|Y\|_r = 1$, thus

$$c \ge 2^{1/p}.$$

(3) The two anti-commuting matrices

$$X = \frac{1}{4}\begin{pmatrix} \sqrt{2} & -2-\sqrt{2} \\ 2-\sqrt{2} & -\sqrt{2} \end{pmatrix}, \qquad Y = [\text{matrix entries not recoverable}].$$

Since X is rank 1 and Y is unitary, both XY and [X, Y] are rank 1. The
singular values of X are (1, 0), those of Y are (1, 1), and those of [X, Y]
are (2, 0). Thus $\|[X,Y]\|_p = 2$, $\|X\|_q = 1$ and $\|Y\|_r = 2^{1/r}$. Of course,
X and Y can be swapped. This gives

$$c \ge 2^{1-1/r}, \qquad c \ge 2^{1-1/q}.$$
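Example (3) can be reproduced numerically. The sketch below takes X with entries $(\sqrt2,\ -2-\sqrt2;\ 2-\sqrt2,\ -\sqrt2)/4$ and, as an assumption, uses the Hadamard matrix for Y: it is one unitary that anticommutes with this X, but the Y intended in the text may be a different one. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def singular_values_2x2(a):
    """Singular values of a 2x2 matrix from s1^2 + s2^2 and s1 * s2."""
    t = sum(abs(v) ** 2 for row in a for v in row)        # s1^2 + s2^2
    d = abs(a[0][0] * a[1][1] - a[0][1] * a[1][0])         # s1 * s2 = |det|
    disc = math.sqrt(max(t * t - 4 * d * d, 0.0))
    return math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))


R2 = math.sqrt(2)
X = [[R2 / 4, (-2 - R2) / 4], [(2 - R2) / 4, -R2 / 4]]
Y = [[1 / R2, 1 / R2], [1 / R2, -1 / R2]]  # ASSUMED choice: Hadamard matrix

anti = combine(matmul(X, Y), matmul(Y, X), +1)   # XY + YX, expected to vanish
comm = combine(matmul(X, Y), matmul(Y, X), -1)   # [X, Y]

print(singular_values_2x2(X))     # close to (1, 0)
print(singular_values_2x2(Y))     # close to (1, 1)
print(singular_values_2x2(comm))  # close to (2, 0)
```

With these singular values, $\|[X,Y]\|_p = 2$, $\|X\|_q = 1$ and $\|Y\|_r = 2^{1/r}$ for every choice of p, q and r, which is what yields the lower bound $2^{1-1/r}$.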

Note that in [1] the special case p = q = r of this conjecture has already
appeared (eq. (28) in [1]), which subsequently has been proven by Wenzel, using
complex interpolation (Riesz-Thorin) methods [14]. The methods investigated
in the present paper will allow us to establish the conjecture for p = q = 2
and general r.

Furthermore, the special case $p = q = r = \infty$ has been proven a long time ago
by Stampfli [12] in the broader setting of operator algebras. The goal there
was to study the operator norm of the operator $D_Y : X \mapsto [X, Y]$ in terms of
the operator norm of Y. In that sense, our conjecture relates the norm of $D_Y$
acting on Schatten class $L_p$ to the Schatten r norm of Y.

In the regime where p, q and r satisfy 1/p = 1/q + 1/r, the best constant is
trivial and equal to 2. In fact, we have the more general theorem for all UI
norms:

Theorem 2 For general complex matrices X and Y, and for any UI norm,

$$|||[X,Y]||| \le 2\,||| |X|^s |||^{1/s}\,||| |Y|^t |||^{1/t}. \qquad (5)$$



Proof. This just follows from a combination of the triangle inequality,

$$|||[X,Y]||| = |||XY - YX||| \le |||XY||| + |||YX|||,$$

with Hölder's inequality for UI norms, applied to each of the two products,

$$|||XY||| \le ||| |X|^s |||^{1/s}\,||| |Y|^t |||^{1/t}, \qquad |||YX||| \le ||| |Y|^t |||^{1/t}\,||| |X|^s |||^{1/s},$$

for s and t satisfying s > 1 and 1/s + 1/t = 1 ([1], Corollary IV.2.6). □

In spite of the triviality of the proof, the factor 2 is the best constant. Indeed,
equality is obtained for X and Y two anti-commuting Pauli matrices.

Applied to Schatten p-norms, this gives the special case mentioned above

$$\|[X,Y]\|_p \le 2\,\|X\|_q\,\|Y\|_r, \qquad (6)$$

for 1/p = 1/q + 1/r.

In the following section we present our proof of Theorem 1, and a certain
expression obtained halfway through it will allow us to make contact with
our main object of interest, namely the variance bounds mentioned at the
beginning.



4 A New, Shorter Proof of Theorem 1

One easily checks the following:

$$\|XY - YX\|_2^2 = \mathrm{Tr}[XYY^*X^* - XYX^*Y^* - YXY^*X^* + YXX^*Y^*]$$
$$= \mathrm{Tr}[X^*XYY^* - XYX^*Y^* - YXY^*X^* + XX^*Y^*Y],$$
$$\|X^*Y + YX^*\|_2^2 = \mathrm{Tr}[YX^*Y^*X + YX^*XY^* + X^*YXY^* + X^*YY^*X]$$
$$= \mathrm{Tr}[X^*XY^*Y + XYX^*Y^* + YXY^*X^* + XX^*YY^*].$$

Taking the sum yields

$$\|XY - YX\|_2^2 + \|X^*Y + YX^*\|_2^2 = \mathrm{Tr}[X^*XYY^* + XX^*Y^*Y + X^*XY^*Y + XX^*YY^*]$$
$$= \mathrm{Tr}(X^*X + XX^*)(Y^*Y + YY^*). \qquad (7)$$

By the Cauchy-Schwarz inequality,

$$|\mathrm{Tr}[Y(X^*X + XX^*)]| = |\mathrm{Tr}[(YX^* + X^*Y)X]| \le \|YX^* + X^*Y\|_2\,\|X\|_2. \qquad (8)$$

Combining (7) and (8) then gives

$$\|XY - YX\|_2^2 \le \mathrm{Tr}(X^*X + XX^*)(Y^*Y + YY^*) - |\mathrm{Tr}[Y(X^*X + XX^*)]|^2/\|X\|_2^2.$$

Introducing the matrix $\rho = (X^*X + XX^*)/(2\|X\|_2^2)$, this can be expressed as

$$\|XY - YX\|_2^2 \le 4\|X\|_2^2\,\big(\mathrm{Tr}[\rho(Y^*Y + YY^*)/2] - |\mathrm{Tr}[\rho Y]|^2\big). \qquad (9)$$



Note that $\rho$ is positive semi-definite and has trace 1 and is formally a density
matrix. The quantity $\mathrm{Tr}[\rho(Y^*Y + YY^*)/2] - |\mathrm{Tr}[\rho Y]|^2$ appearing here is
reminiscent of the variance of a random variable, with $\rho$ taking over the role of a
probability distribution.
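Inequality (9) lends itself to a numerical sanity check. The following sketch, with hypothetical helper names and nested lists standing in for square complex matrices, evaluates both sides of (9) for a given pair X, Y. For the Pauli pair both sides equal 8, and for randomly drawn matrices the left-hand side should not exceed the right-hand side.

```python
import random


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    n = len(a)
    return [[a[i][j] + sign * b[i][j] for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def frob_sq(a):
    """Squared Frobenius norm."""
    return sum(abs(v) ** 2 for row in a for v in row)


def both_sides_of_9(x, y):
    """Return (lhs, rhs) of inequality (9) for square matrices x (non-zero) and y."""
    lhs = frob_sq(combine(matmul(x, y), matmul(y, x), sign=-1))
    nx = frob_sq(x)
    xs, ys = dagger(x), dagger(y)
    # rho = (X*X + XX*) / (2 ||X||_2^2), formally a density matrix
    rho = [[v / (2 * nx) for v in row]
           for row in combine(matmul(xs, x), matmul(x, xs))]
    second_moment = trace(matmul(rho, combine(matmul(ys, y), matmul(y, ys)))).real / 2
    mean = trace(matmul(rho, y))
    return lhs, 4 * nx * (second_moment - abs(mean) ** 2)


def random_matrix(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
            for _ in range(n)]


rng = random.Random(0)  # fixed seed, purely for reproducibility
for _ in range(5):
    lhs, rhs = both_sides_of_9(random_matrix(3, rng), random_matrix(3, rng))
    assert lhs <= rhs + 1e-9
```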

To make the connection even more obvious, consider now the Cartesian
decomposition $Y = A + iB$, where A and B are Hermitian. One checks that
$(Y^*Y + YY^*)/2 = A^2 + B^2$. Therefore,

$$\mathrm{Tr}[\rho(Y^*Y + YY^*)/2] - |\mathrm{Tr}[\rho Y]|^2 = \mathrm{Tr}\,\rho(A^2 + B^2) - (\mathrm{Tr}\,\rho A)^2 - (\mathrm{Tr}\,\rho B)^2,$$

which is a sum of terms in A and in B separately. We now need to show that
the right-hand side is bounded above by $\|Y\|_2^2/2 = (\|A\|_2^2 + \|B\|_2^2)/2$.

This would follow if $\mathrm{Tr}\,\rho A^2 - (\mathrm{Tr}\,\rho A)^2 \le \|A\|_2^2/2$ for all Hermitian A. We
can prove this by passing to a basis in which A is diagonal, so let's put
$A = \mathrm{Diag}(a_1, \ldots, a_d)$ and let us denote the diagonal elements of $\rho$ in that basis
by $p_i$. As the $p_i$ are non-negative and add up to 1, they form a probability
distribution. The quantity $\mathrm{Tr}\,\rho A^2 - |\mathrm{Tr}\,\rho A|^2$ then becomes

$$\sum_i p_i a_i^2 - \Big(\sum_i p_i a_i\Big)^2.$$

This is the variance Var(A) of a random variable A that can assume the values
$a_i$ and does so with probabilities $p_i$. Applying the variance bound

$$\mathrm{Var}(A) \le \sum_i a_i^2/2 = \|A\|_2^2/2$$

then proves the required statement. □

In the remainder of the paper we study the quantity

$$\mathrm{Tr}[\rho(Y^*Y + YY^*)/2] - |\mathrm{Tr}[\rho Y]|^2,$$

which can be seen as a generalisation of the variance of a random variable to
the matrix (quantum) case. We derive sharp bounds on this generalised
variance, which directly lead to sharper bounds on the 2-norm of a commutator,
and which allow us to prove a special case of our conjecture about commutator
norms.



5 Variance of a complex random variable 

First of all, we formally define the variance of a complex random variable.
To do so, we replace squares in (1) by modulus square, whether this makes
statistical sense or not. For $x_i \in \mathbb{C}$:

$$\mathrm{Var}(X) = \sum_i p_i |x_i|^2 - \Big|\sum_i p_i x_i\Big|^2 = \sum_i p_i \Big|x_i - \sum_j p_j x_j\Big|^2. \qquad (10)$$



In statistical terms, this corresponds to the trace of the covariance matrix 
when considering real and imaginary part of X as two random variables. 

Our first result, proven below, is a straight generalisation of the MS bound to 
the complex case. 

Theorem 3 For a random variable X assuming complex values $x_i$, the largest
possible variance obeys

$$\max_p \sum_i p_i \Big|x_i - \sum_j p_j x_j\Big|^2 = \min_{y \in \mathbb{C}} \max_i |x_i - y|^2. \qquad (11)$$


The right-hand side can be interpreted in the context of Euclidean planar 
geometry applied to the complex plane (with the modulus acting as Euclidean 
norm) . 

Definition 1 The radius of a set of points $\mathcal{X} = \{x_i\}$ in the Euclidean plane,
denoted $r(\mathcal{X})$, is the radius of the smallest circle circumscribing $\mathcal{X}$. The center
of $\mathcal{X}$ is the center of that circle.

Theorem 3 thus says that the variance of X taking values in $\mathcal{X}$ is bounded
above by the square of $r(\mathcal{X})$.

Some obvious properties of the radius of a set are that it is invariant under
global translations, rotations and reflections. It is also homogeneous of degree
1.
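For small point sets the radius of Definition 1 can be computed by brute force. The sketch below relies on a standard fact about smallest enclosing circles, not stated in the paper: the optimal center is either the midpoint of two of the points or the circumcenter of three of them. Minimising $\max_i |x_i - y|$ over these candidate centers therefore gives the radius. The function names are hypothetical, and the method is only meant for a handful of points.

```python
import itertools


def circumcenter(a, b, c):
    """Center of the circle through three complex points, or None if collinear."""
    ax, ay, bx, by, cx, cy = a.real, a.imag, b.real, b.imag, c.real, c.imag
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return complex(ux, uy)


def radius_and_center(points):
    """Radius and center of the smallest circle enclosing the given points."""
    pts = [complex(p) for p in points]
    if len(pts) == 1:
        return 0.0, pts[0]
    candidates = [(a + b) / 2 for a, b in itertools.combinations(pts, 2)]
    for a, b, c in itertools.combinations(pts, 3):
        cc = circumcenter(a, b, c)
        if cc is not None:
            candidates.append(cc)
    best = min(candidates, key=lambda y: max(abs(x - y) for x in pts))
    return max(abs(x - best) for x in pts), best


def complex_variance(points, probs):
    """Variance of a complex random variable, equation (10)."""
    mean = sum(p * x for p, x in zip(probs, points))
    return sum(p * abs(x - mean) ** 2 for p, x in zip(probs, points))


# Illustrative points (an assumption): 0 and 2 span a diameter, 1+i lies on the circle.
pts = [0, 2, 1 + 1j]
r, center = radius_and_center(pts)
# Putting probability 1/2 on each end of the diameter attains the bound of Theorem 3.
assert abs(complex_variance(pts, [0.5, 0.5, 0.0]) - r ** 2) < 1e-12
```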
5.1 Proof of Theorem 3



A simple observation will be important for the proof. Let C be the smallest
circumscribing circle of $\mathcal{X}$ and let c be its center. Let $\mathcal{X}'$ be the subset of points
of $\mathcal{X}$ that lie on C. There must be at least two such points, for if it contained
only one point a smaller circle could be found by moving the center towards
that point. Then, for all $x \in \mathcal{X}'$, $|x - c| = r(\mathcal{X})$, and for all other x, $|x - c| < r(\mathcal{X})$
strictly. This means that for a new point d close enough to c, $\max_i |x_i - d|$ is
obtained for $x_i$ on C.

Lemma 2 The center of a set $\mathcal{X}$ endowed with a Euclidean metric is
contained in the convex hull of the points of $\mathcal{X}'$.

Proof. Suppose, to the contrary, that c lies outside the convex hull of $\mathcal{X}'$. By
Minkowski's separating hyperplane theorem there must then be a hyperplane
P such that c is strictly on one side, while all points of $\mathcal{X}'$ are strictly on the
other side of P. Let d be the orthogonal projection of c on P and let x be any
point in $\mathcal{X}'$. From the geometry it follows that the triangle c, d, x has an obtuse
angle at d (see Figure 1). By the cosine rule one then sees that every point
$x \in \mathcal{X}'$ is strictly closer to any point on the open line segment ]cd[ than to c,
violating the assumption that c is the center of $\mathcal{X}$. □




Fig. 1. 



Proof of Theorem 3. We start with the expression $\sum_i p_i |x_i - \sum_j q_j x_j|^2$, where
p and q are two probability distributions. As any average of real quantities is
bounded above by the maximum, we can replace the outer average and get

$$\sum_i p_i \Big|x_i - \sum_j q_j x_j\Big|^2 \le \max_i \Big|x_i - \sum_j q_j x_j\Big|^2.$$

This is true for any q. Hence, the inequality remains if both sides are minimised
over q.

The minimisation of the right-hand side, $\min_q \max_i |x_i - \sum_j q_j x_j|^2$, is almost
the right-hand side of the Theorem, but with the minimisation over any
complex value y replaced by a minimisation over the convex hull of the set of $x_i$.
From Lemma 2, however, we see that the optimal y will be within that convex
hull, so that both minimisations must yield the same value.

We will now show that the left-hand side is minimal for q equal to p, in which
case the value is equal to the variance. Let $\mu_p = \sum_i p_i x_i$ and $\mu_q = \sum_i q_i x_i$.
Obviously, $|\mu_q - \mu_p|^2 \ge 0$. Thus $|\mu_q|^2 - 2\Re\,\bar{\mu}_q\mu_p \ge |\mu_p|^2 - 2\Re\,\bar{\mu}_p\mu_p$. From
this it follows immediately that $\sum_i p_i|x_i - \mu_q|^2 \ge \sum_i p_i|x_i - \mu_p|^2$, for all q.

To show that equality holds, take $p_j$ such that $\sum_j p_j x_j = y^*$, the optimal y,
where only points $x_j$ in $\mathcal{X}'$ contribute. This is possible because, by Lemma 2,
$y^*$ is in the convex hull of those points. □

5.2 Relation between radius and vector norms 

The radius is not a norm, because it is not convex. Nevertheless, our next two
results draw the connection between the radius and permutation invariant
(PI) vector norms. First we show that the radius is bounded above by one
half the value of a specific PI vector norm and then we derive from that how
it relates to all other PI vector norms, giving best constants for each. We
introduce some notation, borrowed from the theory of majorisation: let $|\mathcal{X}|^{\downarrow k}$
be the k-th largest value among the moduli of the elements of $\mathcal{X}$. We freely
consider $\mathcal{X}$ either as a set of d points in $\mathbb{C}$ or as a vector in $\mathbb{C}^d$. We also use
the shorthand $\mathcal{X} - z = \{x_i - z\}_i$.

The central statement is that the maximum in the definition of $r(\mathcal{X})$ can be
replaced by means of the largest and the second largest value.

Theorem 4 For any set of complex values $\mathcal{X} = \{x_i\}_{i=1}^d$, and any $p \ge 1$,

$$r(\mathcal{X}) = \min_{z \in \mathbb{C}} \left(\frac{(|\mathcal{X} - z|^{\downarrow 1})^p + (|\mathcal{X} - z|^{\downarrow 2})^p}{2}\right)^{1/p}. \qquad (12)$$

By putting z = 0, we then immediately get:

Corollary 1 For any set of complex values $\mathcal{X} = \{x_i\}_{i=1}^d$, and any $p \ge 1$,

$$r(\mathcal{X}) \le \left(\big((|\mathcal{X}|^{\downarrow 1})^p + (|\mathcal{X}|^{\downarrow 2})^p\big)/2\right)^{1/p}. \qquad (13)$$

The relation to all other PI norms then follows from:

Theorem 5 For all permutation invariant norms $\|.\|$ on $\mathbb{C}^d$,

$$(|\mathcal{X}|^{\downarrow 1} + |\mathcal{X}|^{\downarrow 2})/2 \le \frac{\|\mathcal{X}\|}{\|F\|} \le \max(\|\mathcal{X}\|_\infty, \|\mathcal{X}\|_1/2). \qquad (14)$$

Because the last theorem is easily generalised to matrices in terms of the Ky
Fan k-norm $\|X\|_{(k)}$, we will prove it for matrices straight away.

Theorem 6 For all unitarily invariant norms $|||.|||$ on $M(\mathbb{C}^d)$,

$$\|X\|_{(2)}/2 \le \frac{|||X|||}{|||F|||} \le \max(\|X\|_\infty, \|X\|_1/2). \qquad (15)$$

Theorem 5 follows by setting $X = \mathrm{Diag}(x_i)$.

Proof. We start with the lower bound. Note first that

$$\|X\|_{(2)}/2 = \|X\|_{(2)}/\|F\|_{(2)}.$$

We wish to prove that of all UI norms, the 2nd Ky Fan norm minimises the
ratio $|||X|||/|||F|||$.

Every unitarily invariant norm $|||.|||$ can be defined as ([6], Theorem 3.5.5)

$$|||X||| = \max\Big\{\sum_i \alpha_i \sigma_i(X) : \alpha \in N_{|||.|||}\Big\},$$

where $N_{|||.|||}$ is a compact subset of $\mathbb{R}^d_{+\downarrow}$ (the non-negative vectors with
non-increasing entries) specific to that norm and $\sigma_i(X)$ are the
singular values of X. Minimising over all UI norms thus amounts to
minimising over all compact sets N. In particular, minimising the ratio $|||X|||/|||F|||$
amounts to minimising over all compact sets N whose associated norm obeys
the constraint $|||F||| = 1$, that is $\max_{\alpha \in N} \alpha_1 + \alpha_2 = 1$. Thus

$$\min_{|||.|||} \frac{|||X|||}{|||F|||} = \min_N \max_{\alpha \in N} \Big\{\sum_i \alpha_i \sigma_i(X) : \alpha_1 + \alpha_2 \le 1\Big\}$$
$$= \min_N \max_{\alpha \in N} \Big\{\sum_i \alpha_i \sigma_i(X) : \alpha_1 + \alpha_2 = 1\Big\}$$
$$= \min_\alpha \{\alpha_1 \sigma_1(X) + \alpha_2 \sigma_2(X) : \alpha_1 + \alpha_2 = 1\}$$
$$= (\sigma_1(X) + \sigma_2(X))/2,$$

which proves the lower bound.

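For the upper bound, we similarly have

$$\max_{|||.|||} \frac{|||X|||}{|||F|||} = \max_N \max_{\alpha \in N} \Big\{\sum_i \alpha_i \sigma_i(X) : \alpha_1 + \alpha_2 \le 1\Big\}.$$

Thus elements $\alpha_3$ and beyond must be as large as possible, which means they
should be equal to $\alpha_2$. The maximisation then reduces to a maximisation over
$\alpha_1 =: a$, where $1/2 \le a \le 1$,

$$\max_{|||.|||} \frac{|||X|||}{|||F|||} = \max_{1/2 \le a \le 1} \Big(a\,\sigma_1(X) + (1 - a)\sum_{k=2}^d \sigma_k(X)\Big)$$
$$= \max_{1/2 \le a \le 1} \Big((2a - 1)\,\sigma_1(X) + (1 - a)\sum_{k=1}^d \sigma_k(X)\Big)$$
$$= \max_{1/2 \le a \le 1} (2a - 1)\|X\|_\infty + (1 - a)\|X\|_1.$$

The maximum is attained in one of the extreme points, a = 1/2 or a = 1,
hence

$$\max_{|||.|||} \frac{|||X|||}{|||F|||} = \max(\|X\|_1/2, \|X\|_\infty).$$

□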
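The two-sided bound (15) can be illustrated for the Schatten family, for which $\|F\|_p = 2^{1/p}$. The sketch below works from an assumed list of at least two singular values and uses hypothetical function names; it checks that the ratio $\|X\|_p/\|F\|_p$ stays between the two bounds for several p.

```python
def schatten_norm(sv, p):
    """Schatten p-norm from a list of singular values."""
    return sum(s ** p for s in sv) ** (1.0 / p)


def theorem6_bounds(sv):
    """Lower and upper bound of (15); sv must contain at least two values."""
    s = sorted(sv, reverse=True)
    lower = (s[0] + s[1]) / 2          # ||X||_(2) / 2
    upper = max(s[0], sum(s) / 2)      # max(||X||_inf, ||X||_1 / 2)
    return lower, upper


def schatten_ratio(sv, p):
    """||X||_p / ||F||_p, using ||F||_p = 2^(1/p) for F = Diag(1, 1, 0, ..., 0)."""
    return schatten_norm(sv, p) / 2 ** (1.0 / p)


# Illustrative singular values (an assumption, not from the paper).
sv = [3.0, 2.0, 1.0]
lower, upper = theorem6_bounds(sv)
for p in (1, 1.5, 2, 4, 10):
    assert lower - 1e-12 <= schatten_ratio(sv, p) <= upper + 1e-12
```

For this example the upper bound is attained at p = 1, where the ratio equals $\|X\|_1/2 = 3$.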

Corollary 2 For all $p \ge p_0$,

$$2^{-1/p_0}\,\| |X|^{p_0} \|_{(2)}^{1/p_0} \le \frac{\|X\|_p}{\|F\|_p} \le \max\big(2^{-1/p_0}\|X\|_{p_0},\ \|X\|_\infty\big). \qquad (16)$$

Note that the norm in the left hand side is the Ky Fan (p, k)-norm $\|X\|_{(2),p_0}$.

Proof. Apply Theorem 6 to $|X|^{p_0}$ and note that $\| |X|^{p_0} \|_q = \|X\|_{p_0 q}^{p_0}$. □
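The proof of Corollary 2 is terse, so the intermediate steps may be worth spelling out. The following is a sketch of how they can be filled in, assuming that Theorem 6 is applied with the Schatten norm of index $p/p_0 \ge 1$; this specific choice of norm is our reading of the proof rather than something stated explicitly.

```latex
\begin{align*}
% Theorem 6 for the matrix |X|^{p_0} and the Schatten norm of index p/p_0:
\frac{\| |X|^{p_0} \|_{(2)}}{2}
  &\le \frac{\| |X|^{p_0} \|_{p/p_0}}{\|F\|_{p/p_0}}
   \le \max\Big( \| |X|^{p_0} \|_\infty ,\ \tfrac12 \| |X|^{p_0} \|_1 \Big), \\
% Each term is rewritten with \| |X|^{p_0} \|_q = \|X\|_{p_0 q}^{p_0}:
\| |X|^{p_0} \|_{p/p_0} &= \|X\|_p^{p_0}, \qquad
\|F\|_{p/p_0} = 2^{p_0/p} = \|F\|_p^{p_0}, \\
\| |X|^{p_0} \|_\infty &= \|X\|_\infty^{p_0}, \qquad
\| |X|^{p_0} \|_1 = \|X\|_{p_0}^{p_0}, \\
% Taking the p_0-th root of all three members gives (16):
2^{-1/p_0}\, \| |X|^{p_0} \|_{(2)}^{1/p_0}
  &\le \frac{\|X\|_p}{\|F\|_p}
   \le \max\big( \|X\|_\infty ,\ 2^{-1/p_0} \|X\|_{p_0} \big).
\end{align*}
```

The condition $p \ge p_0$ is what makes $p/p_0 \ge 1$, so that the Schatten index used in the first line defines a norm.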

5.3 Proof of Theorem 4

To prove Theorem 4 we need a lemma.

Lemma 3 Consider a polygon P. Let P' be the polygon whose vertices are
the midpoints of the edges of P. Then the center of the smallest circle that
circumscribes P is in P'.

Proof. Consider first the simplest case that P is a triangle ABC. By a
well-known and easily proven geometrical theorem, the center of a circumscribing
circle containing all three points of the triangle is equal to the intersection
D of the bisectors of the triangle's edges. By definition, these bisectors pass
through the midpoints of the edges of P, which are the vertices of P'. By
inspection one sees that if D lies in P = ABC, it must also lie in P'. The two
possible cases are illustrated in Figure 2.



Fig. 2.

If P is a general polygon, it can be subdivided into one or more non-overlapping
triangles. According to Lemma 2, the center D of the smallest circle
circumscribing P is in P. Therefore, D is in one of those triangles; call it ABC.
By the above argument, D is also in the triangle of midpoints of ABC. This
triangle of midpoints is a subset of the polygon P' of midpoints. Hence D is
in P' too. □

Proof of Theorem 4. We will show that the optimal z in the RHS of (12) is
equal to $y^*$, the optimal y in $r(\mathcal{X}) = \min_y \max_i |x_i - y|$.

We relabel the points of $\mathcal{X}$ so that $x_1, x_2, \ldots, x_m$ are the points in $\mathcal{X}'$, i.e. they
are the points on the smallest circumscribing circle around $\mathcal{X}$, which has center
in $y^*$. There must be at least two points in $\mathcal{X}'$. Thus, $|\mathcal{X} - y^*|^{\downarrow 1} = |\mathcal{X} - y^*|^{\downarrow 2}$,
so that the RHS of (12) is equal to its LHS in the point $z = y^*$.

We must show that the RHS is minimal in $z = y^*$. We begin by pointing out
that the RHS is a norm of $\mathcal{X} - z$ and hence a convex function of z (see, e.g.
[1], Example IV.1.4). Thus, every local minimum of this function is
automatically a global minimum. To find out whether $z = y^*$ is indeed
the global minimum, it suffices to check whether it is a local minimum. We'll
do so by perturbing z by an infinitesimal amount: $z = y^* + t\Delta$.

If t is small enough, the only contributions to the derivative of $(|\mathcal{X} - z|^{\downarrow 1})^p +
(|\mathcal{X} - z|^{\downarrow 2})^p$ come from the derivatives of $|x_1 - z|, |x_2 - z|, \ldots, |x_m - z|$. More
precisely, only the two largest of these derivatives contribute. The derivative
of $|x_i - z|^p$ w.r.t. t in t = 0 is $p|x_i - y^*|^{p-1}$ (a constant factor for $x_i \in \mathcal{X}'$)
times the derivative of $|x_i - z|$ w.r.t. t in t = 0, which is, up to the positive
factor $1/|x_i - y^*|$ (again constant for $x_i \in \mathcal{X}'$), equal to $-(\Delta, x_i - y^*)$.

To show that $y^*$ is a local minimum, we have to show that for any $\Delta$ the
sum of the two largest derivatives is non-negative. This means that, for any
$\Delta$, there exist distinct i and j such that $-(\Delta, x_i - y^*) - (\Delta, x_j - y^*) \ge 0$,
i.e. $((-\Delta), (x_i + x_j)/2 - y^*) \ge 0$. Now, the set of points $(x_i + x_j)/2$ contains
the midpoints of the edges of the polygon P with vertices $x_1, x_2, \ldots, x_m$. By
Lemma 3 the polygon P' whose vertices are these midpoints contains the center
$y^*$ of the circle. Therefore, for any direction $\Delta$ there will be some midpoint
$(x_i + x_j)/2$ such that $((-\Delta), (x_i + x_j)/2 - y^*) \ge 0$. This shows that, indeed,
$z = y^*$ is a local minimum. □

This proof relies heavily on planar geometry. It would be interesting to find 
an entirely algebraic proof. 

5.4 Main Result 

Combining all results obtained so far yields the main theorem of this section: 

Theorem 7 For a complex valued random variable X, taking values in the
discrete set $\mathcal{X} = \{x_i\}_{i=1}^d$, and for any PI vector norm $\|.\|$,

$$\sqrt{\mathrm{Var}(X)} \le r(\mathcal{X}) \le \|\mathcal{X}\|_{(2)}/2 \le \frac{\|\mathcal{X}\|}{\|F\|}, \qquad (17)$$

where $\mathcal{X}$ has been interpreted as a vector in $\mathbb{C}^d$, and F = (1, 1, 0, ..., 0).

Remark. Many other generalisations are possible of the concepts introduced
here. In the above we've considered complex valued X, with norm given by
the complex modulus. This is isomorphic to vectors in $\mathbb{R}^2$, endowed with the
Euclidean 2-norm. We can more generally consider X whose values are in $\ell_p$,
or even in the Schatten class $L_p$.



6 Quantum Variance of Normal Matrices 

In this section we consider variance bounds in the matrix setting, where
probability distributions are replaced by density matrices. This leads to the following
definition of the variance of a normal matrix X:

Definition 2 The quantum variance of a normal matrix $X \in M_d(\mathbb{C})$ w.r.t.
the density matrix $\rho$ is given by

$$\mathrm{Var}(X) = \mathrm{Tr}[\rho|X|^2] - |\mathrm{Tr}[\rho X]|^2 = \mathrm{Tr}[\rho\,|X - \mathrm{Tr}[\rho X]\mathbb{1}_d|^2]. \qquad (18)$$

Here, $|.|$ stands for the matrix modulus defined by $|X| = (X^*X)^{1/2}$, and $\mathbb{1}_d$
is the $d \times d$ identity matrix.

Remark. In the mathematical physics literature a more general version of the
variance can be found, based on unital completely positive maps $\Phi$ [2], where
the variance is operator-valued. Our definition here corresponds to the choice
$\Phi(X) = \mathrm{Tr}[\rho X]$, yielding a scalar-valued variance.

This definition is a straightforward generalisation of the classical variance for
complex scalar variables. Since X is normal, it can be diagonalised by a unitary
conjugation. Inserting $X = U\Lambda U^*$ in the definition, with $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_d)$
complex, yields

$$\mathrm{Var}(X) = \sum_i p_i |\lambda_i|^2 - \Big|\sum_j p_j \lambda_j\Big|^2,$$

where $p_i$ is the diagonal element $(U^*\rho U)_{ii}$. Therefore, if $\rho$ can be any density
matrix, $p = (p_1, \ldots, p_d)$ can be any probability distribution. It follows that
the variance bounds obtained for complex variables carry over wholesale to
normal matrices, by applying them to the spectrum of the normal matrix.

In particular, the radius $r(X)$ of a normal matrix is the radius of its spectrum.
Note, however, that the term spectral radius is already in use and denotes the 
radius of the smallest circumscribing circle with center at the origin. One can 
easily show that the spectral radius of a normal matrix is an upper bound on 
the radius of its spectrum. 

The main result of the last section becomes:

Theorem 8 For a normal $d \times d$ matrix X and for any UI norm $|||.|||$,

$$\sqrt{\mathrm{Var}(X)} \le r(X) \le \|X\|_{(2)}/2 \le \frac{|||X|||}{|||F|||}, \quad \text{e.g. } 2^{-1/p}\|X\|_p. \qquad (19)$$

Using this theorem, Böttcher and Wenzel's theorem can already be
strengthened in the specific case of normal Y, by combining the statement obtained
halfway through its proof with Theorem 8. For normal Y, and all $p \ge 1$,

$$\|[X,Y]\|_2 \le 2\|X\|_2\, r(Y) \le \|X\|_2 \|Y\|_{(2)} \le 2^{1-1/p}\|X\|_2\|Y\|_p. \qquad (20)$$
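As a quick worked check of this chain of inequalities, one may take the Pauli pair used earlier as the equality case of Theorem 1. This instance is chosen here for illustration only; it shows that every inequality in the chain can hold with equality at once.

```latex
\begin{align*}
% X = \sigma_x, \; Y = \sigma_z (normal, with spectrum \{+1, -1\}):
\|[X,Y]\|_2 &= \|-2i\sigma_y\|_2 = 2\sqrt{2}, \qquad \|X\|_2 = \sqrt{2}, \\
% the spectrum \{+1,-1\} is enclosed by the circle of radius 1 about the origin:
r(Y) &= 1 \quad\Longrightarrow\quad 2\|X\|_2\, r(Y) = 2\sqrt{2}, \\
% both singular values of Y equal 1:
\|Y\|_{(2)} &= 2 \quad\Longrightarrow\quad \|X\|_2 \|Y\|_{(2)} = 2\sqrt{2}, \\
\|Y\|_p &= 2^{1/p} \quad\Longrightarrow\quad 2^{1-1/p}\|X\|_2\|Y\|_p = 2\sqrt{2}.
\end{align*}
```

For p = 2 the last member is the bound of Theorem 1, so for this pair the chain collapses to the known equality case.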



7 Quantum Variance of Non-Normal Matrices 

We will now investigate the general case, of quantum variance of a non-normal
matrix. In this case the left modulus and right modulus of X, $(X^*X)^{1/2}$ and
$(XX^*)^{1/2}$, are no longer the same. Therefore, there are many possible distinct
extensions of the expression $\mathrm{Tr}[\rho|X|^2]$. One is $\mathrm{Tr}[\rho X^*X]$, another is $\mathrm{Tr}[\rho XX^*]$,
and we'll also consider the mean of the two, $\mathrm{Tr}[\rho(X^*X + XX^*)]/2$, which
featured prominently in our proof of Theorem 1.

For that reason we need a name for the expression $((X^*X + XX^*)/2)^{1/2}$, and
we have chosen to call it the Cartesian modulus. One observes that in terms
of the Cartesian decomposition of X, $X = A + iB$ with A and B Hermitian,
the Cartesian modulus reduces to the pleasing form $(A^2 + B^2)^{1/2}$.

For convenience, we'll denote the three corresponding moduli by $|.|_*$, each with
a different subscript:

$$|X|_L := (X^*X)^{1/2}, \qquad (21)$$
$$|X|_R := (XX^*)^{1/2}, \qquad (22)$$
$$|X|_C := \sqrt{(|X|_L^2 + |X|_R^2)/2}. \qquad (23)$$

Note that $||| |X|_L ||| = |||X|||$ for any UI norm and the same holds for the right
modulus. For the Cartesian modulus this is no longer true, but we do have the
following inequalities for Schatten p-norms obtained by Bhatia and Kittaneh
([15], eqns (3.38) and (3.39)): for $p \ge 2$,

$$\| |X|_C \|_p \le \|X\|_p \le 2^{1/2-1/p}\,\| |X|_C \|_p, \qquad (24)$$

while the reversed inequalities hold for $1 \le p \le 2$. More fundamental is the
following inequality for the Ky Fan (p, k)-norms with p = 2

$$\| |X|_C \|_{(k),2} \le \|X\|_{(k),2}, \qquad (25)$$

for any k (which in [15] is phrased as a majorisation statement; see its eq.
(3.31)).

Each modulus builds a different variance, which we'll distinguish by the
corresponding subscript too. Thus

$$\mathrm{Var}_*(X) = \mathrm{Tr}[\rho\,|X - \mathrm{Tr}[\rho X]\mathbb{1}_d|_*^2] \qquad (26)$$

where * stands for L, R or C. It is easily checked that in each case, the variance
satisfies the relation

$$\mathrm{Var}_*(X) = \mathrm{Tr}[\rho|X|_*^2] - |\mathrm{Tr}[\rho X]|^2. \qquad (27)$$
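The three variances, and the agreement between (26) and (27), can be checked on a small non-normal example. The sketch below uses hypothetical helper names, nested lists for matrices, and an illustrative density matrix that does not come from the paper; the matrix X is the basis element $e^{12}$.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def variances_26(x, rho):
    """(Var_L, Var_R, Var_C) from definition (26)."""
    n = len(x)
    mean = trace(matmul(rho, x))
    z = [[x[i][j] - (mean if i == j else 0) for j in range(n)] for i in range(n)]
    zs = dagger(z)
    var_l = trace(matmul(rho, matmul(zs, z))).real   # left modulus squared: Z*Z
    var_r = trace(matmul(rho, matmul(z, zs))).real   # right modulus squared: ZZ*
    return var_l, var_r, (var_l + var_r) / 2         # Cartesian: mean of the two


def variances_27(x, rho):
    """(Var_L, Var_R, Var_C) from relation (27)."""
    xs = dagger(x)
    shift = abs(trace(matmul(rho, x))) ** 2
    m_l = trace(matmul(rho, matmul(xs, x))).real
    m_r = trace(matmul(rho, matmul(x, xs))).real
    return m_l - shift, m_r - shift, (m_l + m_r) / 2 - shift


X = [[0, 1], [0, 0]]                                  # e^{12}, not normal
RHO = [[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]]          # illustrative density matrix

print(variances_26(X, RHO))   # approximately (0.25, 0.65, 0.45)
print(variances_27(X, RHO))   # the same three values
```

The example makes visible that the left and right variances genuinely differ for a non-normal matrix, while the Cartesian variance is their mean.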



We next show how to generalise Theorem 3 to the non-normal matrix case.
In the proof we need the numerical range W(X) of a matrix X [6]: $W(X) =
\{\psi^* X \psi : \psi \in \mathbb{C}^d, \|\psi\| = 1\}$. By the Toeplitz-Hausdorff theorem, W(X) is a
convex set. It can therefore be redefined in terms of density matrices as

$$W(X) = \{\mathrm{Tr}[\rho X] : \rho \ge 0, \mathrm{Tr}\,\rho = 1\}. \qquad (28)$$

Henceforth, we use the shorthand $\max_\rho$ or $\min_\rho$ to denote maximisation and
minimisation over all possible density matrices $\rho$.

Theorem 9 For a non-normal $d \times d$ matrix X,

$$\mathrm{Var}_*(X) \le \max_\rho \mathrm{Tr}[\rho\,|X - \mathrm{Tr}[\rho X]\mathbb{1}_d|_*^2] = \min_{y \in \mathbb{C}} \|\,|X - y\mathbb{1}|_*^2\,\|_\infty. \qquad (29)$$

Furthermore, the maximisation over $\rho$ can be restricted to density matrices of
rank 1, of the form $\psi\psi^*$, with $\psi$ a normalised vector in $\mathbb{C}^d$.

Proof. The proof proceeds in a similar way as in the complex variable case.
The bivariate function

$$(\rho,\sigma) \mapsto f(\rho,\sigma) = \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_*^2\big]$$

satisfies the following properties: its domains are compact convex sets (being
the set of all density matrices), the function is convex in $\sigma$ for all $\rho$, concave
(linear, in fact) in $\rho$ for all $\sigma$, and continuous in both $\rho$ and $\sigma$. All conditions of
Kakutani's minimax theorem [7] are therefore fulfilled, hence in the minimax
expression $\min_\sigma \max_\rho f(\rho,\sigma)$ the minimisation over $\sigma$ and maximisation over
$\rho$ can be freely interchanged.

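One easily verifies that

$$\mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_R^2\big] - \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}|_R^2\big] = |\mathrm{Tr}[\sigma X] - \mathrm{Tr}[\rho X]|^2 \ge 0,$$

so that the minimum of $\mathrm{Tr}[\rho\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_R^2]$ over $\sigma$ is obtained for $\sigma = \rho$. The
same is obviously true for the left modulus, and it also holds for the Cartesian
modulus since $|\cdot|_C^2 = (|\cdot|_L^2 + |\cdot|_R^2)/2$.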

Therefore, we get the following chain of equalities: 

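$$\begin{aligned}
\max_\rho \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}|_*^2\big]
&= \max_\rho \min_\sigma \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_*^2\big] \\
&= \min_\sigma \max_\rho \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_*^2\big] \qquad (*) \\
&= \min_\sigma \big\|\,|X - \mathrm{Tr}[\sigma X]\,\mathbb{1}|_*^2\,\big\|_\infty \\
&= \min_{y \in W(X)} \big\|\,|X - y\mathbb{1}|_*^2\,\big\|_\infty \\
&= \min_{y \in \mathbb{C}} \big\|\,|X - y\mathbb{1}|_*^2\,\big\|_\infty.
\end{aligned}$$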



In the third line we used the Rayleigh-Ritz characterisation of the largest
eigenvalue of a Hermitian matrix. In the last line we could remove the
constraint $y \in W(X)$ because of the fact, proven in Lemma 4 below, that the
optimal $y$ in $\min_{y \in \mathbb{C}} \|\,|X - y\mathbb{1}|_*^2\,\|_\infty$ is automatically in $W(X)$.

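To prove the final statement of the theorem, we note that in (*) the
maximisation over $\rho$ can be restricted to $\rho$ that have rank 1, because the
objective is linear in $\rho$. Furthermore, the minimisation over all density
matrices $\sigma$ can also be done for $\sigma$ that have rank 1. This is because the
numerical range $W(X)$ is a convex set, hence $\mathrm{Tr}\,\sigma X$ and $\langle\phi, X\phi\rangle$
cover the same set. We can thus replace (*) by

$$\min_\phi \max_\psi\, \langle\psi, |X - \langle\phi, X\phi\rangle\mathbb{1}|_*^2\,\psi\rangle.$$

A short calculation yields that this is equal to

$$\min_\phi \max_\psi \Big(\langle\psi, |X|_*^2\,\psi\rangle + |\langle\phi, X\phi\rangle - \langle\psi, X\psi\rangle|^2 - |\langle\psi, X\psi\rangle|^2\Big),$$

and one sees that the minimum over $\phi$ is obtained for $\phi = \psi$, and is equal to

$$\max_\psi \Big(\langle\psi, |X|_*^2\,\psi\rangle - |\langle\psi, X\psi\rangle|^2\Big),$$

which proves that the maximum $*$-variance of $X$ over all $\rho$ is indeed obtained
for $\rho$ of rank 1. □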
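
The two sides of (29) can be compared numerically on a small example. The sketch below uses the Cartesian modulus (assumed to be $|X|_C^2 = (XX^* + X^*X)/2$) and the matrix unit with a single 1 in position (1,2). It maximises the variance over a grid of pure states and minimises the shifted norm over a grid of complex $y$; both are crude brute-force searches with hypothetical names and grid sizes, meant only as an illustration.

```python
import cmath
import math

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def cartesian_sq(a):
    """|a|_C^2 = (a a^* + a^* a)/2 (assumed convention)."""
    left, right = mul(a, dag(a)), mul(dag(a), a)
    return [[(left[i][j] + right[i][j]) / 2 for j in range(2)] for i in range(2)]

def lam_max(h):
    """Largest eigenvalue of a 2x2 Hermitian matrix."""
    a, d, b = h[0][0].real, h[1][1].real, h[0][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)

def expect(psi, m):
    """Quadratic form <psi, m psi>."""
    return sum(psi[i].conjugate() * m[i][j] * psi[j]
               for i in range(2) for j in range(2))

def max_pure_state_variance(X, n=60):
    """Left-hand maximum in (29), restricted to a grid of rank-1 states."""
    M = cartesian_sq(X)
    best = 0.0
    for s in range(n + 1):
        t = (math.pi / 2) * s / n
        for k in range(n):
            psi = [math.cos(t), cmath.exp(2j * math.pi * k / n) * math.sin(t)]
            best = max(best, expect(psi, M).real - abs(expect(psi, X)) ** 2)
    return best

def min_shifted_norm(X, half_width=1.0, n=40):
    """Right-hand minimum in (29), over a square grid of shifts y."""
    best = float("inf")
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            y = complex(i, j) * half_width / n
            shifted = [[X[r][c] - (y if r == c else 0) for c in range(2)]
                       for r in range(2)]
            best = min(best, lam_max(cartesian_sq(shifted)))
    return best

E12 = [[0, 1], [0, 0]]
print(max_pure_state_variance(E12), min_shifted_norm(E12))
```

For this matrix both searches return 1/2 up to rounding: the maximum is attained at the first basis vector and the minimum at $y = 0$.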



7.1 Radius and Cartesian radius 
Because of Theorem 9 we define:

Definition 3 The $*$-radius of a non-normal matrix $X$ is

$$r_*(X) = \min_{y \in \mathbb{C}} \big\|\,|X - y\mathbb{1}|_*\,\big\|_\infty, \tag{30}$$

where $*$ may stand for L, R or C, corresponding to the use of the respective
$*$-modulus. By the theorem we've just proven, we also have the dual definition

$$r_*(X) = \max_\rho \big(\mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}|_*^2\big]\big)^{1/2}. \tag{31}$$



It is easy to see that left and right moduli yield the same value; moreover,
the Cartesian modulus yields a radius that is bounded above by the left/right
radius.

Theorem 10 For any matrix $X$,

$$r_C(X) \le r_L(X) = r_R(X).$$



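Proof. The statement of equality of L and R radius follows from their definition
and the fact that $\|XX^*\| = \|X^*X\|$ for any UI norm.

Let $\rho$ be an optimal $\rho$ in the dual expression (31) for $r_C(X)$. In general, $\rho$ is
not optimal for $r_L$ nor $r_R$. Thus,

$$\begin{aligned}
r_C^2(X) &= \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}_d|_C^2\big] \\
&= \big(\mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}_d|_L^2\big] + \mathrm{Tr}\big[\rho\,|X - \mathrm{Tr}[\rho X]\,\mathbb{1}_d|_R^2\big]\big)/2 \\
&\le (r_L^2(X) + r_R^2(X))/2 \\
&= r_L^2(X) = r_R^2(X).
\end{aligned}$$

□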

By this result, we no longer need to distinguish between $r_L(X)$ and $r_R(X)$,
and we'll denote it just by $r(X)$ and call it the radius of $X$, while we call
$r_C(X)$ the Cartesian radius.

For the proof of Theorem 9 we needed the matrix equivalent of Lemma 2. This
lemma already appeared in Stampfli's paper [12] but was proven in a different
way and only for the left modulus.

Lemma 4 For any matrix $X$, the value of $y \in \mathbb{C}$ that achieves the minimum
of $\|\,|X - y\mathbb{1}|_*\,\|$ is contained in the numerical range $W(X)$.

Proof. We will prove this by contradiction. A point $z \in \mathbb{C}$ is in the numerical
range $W(X)$ if and only if [6]

$$\forall \phi \in \mathbb{R} : \Re(e^{i\phi} z) \le \lambda_{\max}(\Re(e^{i\phi} X)),$$

where the real part of a matrix is defined as $\Re A = (A + A^*)/2$.

Let $y'$ be a complex number that is not in $W(X)$. Thus there exists an angle
$\phi$ such that $\Re(e^{i\phi} y') > \lambda_{\max}(\Re(e^{i\phi} X))$, strictly, or

$$\lambda_{\max}(\Re(e^{i\phi}(X - y'\mathbb{1}))) < 0.$$

We will show that this $y'$ cannot be optimal for $\min_y \|\,|X - y\mathbb{1}|_R^2\,\|_\infty$.

Obviously, $|X - y\mathbb{1}|_R^2 = |e^{i\phi}X - e^{i\phi}y\mathbb{1}|_R^2$. Thus, defining
$Z = e^{i\phi}X - e^{i\phi}y'\mathbb{1}$ and setting $y = y' + e^{-i\phi}\epsilon$, we only need to
prove that if $\lambda_{\max}(\Re Z) < 0$, then the minimum of $\lambda_{\max}(|Z - \epsilon\mathbb{1}|_R^2)$
is not achieved for $\epsilon = 0$. Since this is a convex function of $\epsilon$, it suffices to
consider values of $\epsilon$ in an arbitrarily small neighbourhood of 0.

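Now put $Z = -(A + iB)$, with $A$ and $B$ Hermitian. The condition $\lambda_{\max}(\Re Z) < 0$
means that $A$ should be strictly positive definite. Does $A > 0$ imply that
$\|\,|(A + \epsilon\mathbb{1}) + iB|_R^2\,\|_\infty$ is not minimal in $\epsilon = 0$? It turns out that it suffices
to consider real $\epsilon$ only. A short calculation shows

$$\begin{aligned}
|(A + \epsilon\mathbb{1}) + iB|_R^2 &= |A + \epsilon\mathbb{1}|^2 - |A|^2 + |A + iB|_R^2 \\
&= 2\epsilon(A + \epsilon\mathbb{1}/2) + |A + iB|_R^2.
\end{aligned}$$

Since $A > 0$, we can choose an $\epsilon < 0$ such that we still have $A + \epsilon\mathbb{1}/2 > 0$
strictly. Thus, there is an $\eta > 0$ (given by $\lambda_{\min}(A) + \epsilon/2$) such that
$A + \epsilon\mathbb{1}/2 \ge \eta\mathbb{1}$. Then we have $2\epsilon(A + \epsilon\mathbb{1}/2) \le 2\epsilon\eta\mathbb{1}$. Therefore,

$$\begin{aligned}
\lambda_{\max}(|(A + \epsilon\mathbb{1}) + iB|_R^2) &= \lambda_{\max}(2\epsilon(A + \epsilon\mathbb{1}/2) + |A + iB|_R^2) \\
&\le \lambda_{\max}(2\epsilon\eta\mathbb{1} + |A + iB|_R^2) \\
&= 2\epsilon\eta + \lambda_{\max}(|A + iB|_R^2) \\
&< \lambda_{\max}(|A + iB|_R^2).
\end{aligned}$$

Thus, indeed, $\epsilon = 0$ is not the minimum, as we set out to prove.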

One immediately verifies that the same reasoning holds for the left modulus 
and the Cartesian modulus too. □ 
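
The key step of the proof, that a small negative shift $\epsilon$ lowers the largest eigenvalue by at least $2|\epsilon|\eta$, can be illustrated numerically. The sketch below assumes the convention $|M|_R^2 = M^*M$; the matrices `A`, `B` and the value of `eps` are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def eig_herm(h):
    """Eigenvalues (largest, smallest) of a 2x2 Hermitian matrix."""
    a, d, b = h[0][0].real, h[1][1].real, h[0][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mid + rad, mid - rad

def top_eigenvalue(A, B, eps):
    """lambda_max of |(A + eps*1) + iB|_R^2, with |M|_R^2 = M^* M assumed."""
    M = [[A[i][j] + (eps if i == j else 0) + 1j * B[i][j] for j in range(2)]
         for i in range(2)]
    return eig_herm(mul(dag(M), M))[0]

A = [[2, 0.5], [0.5, 1]]    # Hermitian and strictly positive definite
B = [[0, 1j], [-1j, 1]]     # Hermitian
eps = -0.1                  # small negative shift
eta = eig_herm(A)[1] + eps / 2   # lambda_min(A) + eps/2, must stay positive

before, after = top_eigenvalue(A, B, 0.0), top_eigenvalue(A, B, eps)
print("eta =", eta)
print("lambda_max at eps = 0:", before)
print("lambda_max at eps < 0:", after)
print("bound 2*eps*eta respected:", after <= before + 2 * eps * eta + 1e-12)
```

The largest eigenvalue at `eps = -0.1` is smaller than at `eps = 0`, so the unshifted point is not a minimiser for this choice of `A` and `B`.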

7.2 Radius compared to numerical radius 

One can now ask how these different radii $r(X)$ and $r_C(X)$ relate to the
numerical range $W(X)$. While we do not know the ultimate answer, we do
know that none of the radii is the radius of the smallest circle circumscribing
$W(X)$. The Cartesian radius of $X$ can be expressed as

$$r_C^2(X) = \min_z \max_\rho \mathrm{Tr}\,\rho\,|X - z\mathbb{1}|_C^2.$$

The radius of $W(X)$ is

$$r_W(X) := r(W(X)) = \min_z \max_\rho |\mathrm{Tr}\,\rho(X - z\mathbb{1})|.$$

Again, this is not to be confused with the numerical radius, $w(X) := \max_\rho |\mathrm{Tr}\,\rho X|$.
We therefore call $r_W$ the central numerical radius. We have:

$$r_W(X) = \min_{z \in \mathbb{C}} w(X - z\mathbb{1}).$$

We now show that the central numerical radius is never bigger than the
Cartesian radius. This follows directly from:

Theorem 11 For all matrices $X$,

$$w(X) \le \|\,|X|_C\,\|.$$

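Proof. In terms of the Cartesian decomposition of $X = A + iB$,

$$w(X) = \max_\rho |\mathrm{Tr}\,\rho(A + iB)| = \max_\rho \sqrt{(\mathrm{Tr}\,\rho A)^2 + (\mathrm{Tr}\,\rho B)^2},$$

and

$$\|\,|X|_C\,\| = \|\sqrt{A^2 + B^2}\| = \max_\rho \mathrm{Tr}\,\rho\sqrt{A^2 + B^2}.$$

The theorem would follow if, for all density matrices $\rho$ and Hermitian $A$ and $B$,

$$\sqrt{(\mathrm{Tr}\,\rho A)^2 + (\mathrm{Tr}\,\rho B)^2} \le \mathrm{Tr}\,\rho\sqrt{A^2 + B^2}. \tag{32}$$

Note first that $|\mathrm{Tr}\,\rho A| \le \mathrm{Tr}\,\rho|A|$, while the right-hand side of (32) depends
on $A$ and $B$ only through $A^2$ and $B^2$; thus we only have to prove the inequality for
positive $A$ and $B$. Indeed, let $A = A_+ - A_-$ be the Jordan decomposition of $A$,
then $|\mathrm{Tr}\,\rho A| = |\mathrm{Tr}\,\rho A_+ - \mathrm{Tr}\,\rho A_-| \le |\mathrm{Tr}\,\rho A_+| + |\mathrm{Tr}\,\rho A_-| = \mathrm{Tr}\,\rho A_+ + \mathrm{Tr}\,\rho A_- = \mathrm{Tr}\,\rho|A|$.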

By making the substitutions $A = X^{1/2}$ and $B = Y^{1/2}$, and taking squares on
both sides, the inequality becomes

$$(\mathrm{Tr}\,\rho X^{1/2})^2 + (\mathrm{Tr}\,\rho Y^{1/2})^2 \le (\mathrm{Tr}\,\rho\sqrt{X + Y})^2, \tag{33}$$

which expresses the concavity of the function $X \mapsto (\mathrm{Tr}\,\rho X^{1/2})^2$ on the set of
positive matrices (this function is homogeneous of degree 1, so its concavity is
equivalent to the superadditivity stated in (33)). It turns out that the function
$X \mapsto (\mathrm{Tr}\,\rho X^{1/p})^p$ is concave for all $p \ge 1$. This can be proven by reducing
the statement to Epstein's theorem [5], which states that the function
$X \mapsto \mathrm{Tr}(BX^{1/p}B)^p$ is concave for all $p \ge 1$. Taking, in particular, $B = \psi\psi^*$,
with $\psi$ any normalised vector, shows that the function $X \mapsto (\psi^* X^{1/p}\psi)^p$ is
concave, and that already proves (33) and (32) for $\rho$ that have rank 1. The
validity of (32) for general $\rho$ then follows immediately by noting that any
density matrix $\rho$ can be written as a convex combination of rank 1 density
matrices, the left-hand side of (32) is convex in $\rho$, and the right-hand side is
linear in $\rho$. □

This easily gives: 

Corollary 3 For all matrices $X$, $r_W(X) \le r_C(X)$.

Proof. By the previous theorem, for all $z \in \mathbb{C}$, $w(X - z\mathbb{1}) \le \|\,|X - z\mathbb{1}|_C\,\|$.
Minimising both sides over all $z \in \mathbb{C}$ then gives $r_W(X) \le r_C(X)$. □
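
The inequality of Theorem 11, on which this corollary rests, can be spot-checked for 2 x 2 matrices. In the sketch below the numerical radius is estimated by a grid search over pure states (so the estimate can only undershoot $w(X)$), and the operator norm of $|X|_C$ is computed in closed form, assuming $|X|_C^2 = (XX^* + X^*X)/2$. The function names and the test matrices are hypothetical.

```python
import cmath
import math

def numerical_radius(X, n=120):
    """Grid estimate (a lower bound) of w(X) = max |<psi, X psi>|, |psi| = 1."""
    best = 0.0
    for s in range(n + 1):
        t = (math.pi / 2) * s / n
        for k in range(n):
            psi = [math.cos(t), cmath.exp(2j * math.pi * k / n) * math.sin(t)]
            val = sum(psi[i].conjugate() * X[i][j] * psi[j]
                      for i in range(2) for j in range(2))
            best = max(best, abs(val))
    return best

def cartesian_norm(X):
    """Operator norm of |X|_C, with |X|_C^2 = (X X^* + X^* X)/2 assumed."""
    Xd = [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    left, right = mul(X, Xd), mul(Xd, X)
    a = (left[0][0] + right[0][0]).real / 2
    d = (left[1][1] + right[1][1]).real / 2
    b = (left[0][1] + right[0][1]) / 2
    top = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return math.sqrt(top)

E12 = [[0, 1], [0, 0]]        # matrix unit: w = 1/2, norm of |X|_C = 1/sqrt(2)
X2 = [[1, 2], [0, 1j]]        # arbitrary non-normal matrix
for M in (E12, X2):
    print(numerical_radius(M), "<=", cartesian_norm(M))
```

For the matrix unit the two values are 1/2 and $1/\sqrt{2}$, so the inequality of Theorem 11 is strict there.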



7.3 Radius compared to matrix norms 

Coming back to the definition of the various radii, as given by (31), one can
again ask whether the infinity norm in (29) has to be replaced by the second
Ky Fan norm, as was the case for normal matrices, to yield the best possible
norm-based bounds on the radii. The answer is negative. Instead, we have the
following theorem that gives a bound on the L and R radius in terms of the
infinity norm, and a bound on the Cartesian radius in terms of the Ky Fan
$\|\cdot\|_{(2),2}$-norm. The reason for these different choices of norms is that these
norms turn out to be the fundamental ones for each case, from which best
bounds for every other norm can be derived.

It can be expected that non-normal matrices might allow larger radii for fixed 
given norm. This is indeed the case. The best bound for the L and R radius 
is much weaker than in the normal case, to the point that its proof is actually 
trivial. The best bound for the Cartesian radius is stronger, and coincides 
with the bounds for the normal case for many norms. To see this, compare
for example Corollary 4 below with Theorem 4; more precisely, the normal and
non-normal bounds coincide for Schatten p-norms with $p \ge 2$. As could be
expected, the proof is also harder. This can be seen as an indication that 
the Cartesian norm is the natural norm to use as far as radii of non-normal 
matrices are concerned. 

Theorem 12 For any $n \times n$ matrix $X$,

$$r_L(X) \le \|X\|_{(1)} = \|X\|_\infty,$$

while

$$r_C(X) \le \frac{1}{\sqrt{2}}\,\|X\|_{(2),2}.$$

Proof. The bound for $r_L$ follows immediately from the definition (30) by
replacing the optimal $y$ by the suboptimal $y = 0$.

For the $r_C$ bound, we will exploit the fact that there is a rank 1 density
matrix $\rho$ achieving optimality in $r_C^2(X) = \max_\rho \big(\mathrm{Tr}\,\rho|X|_C^2 - |\mathrm{Tr}\,\rho X|^2\big)$. Let $\psi$
be the normalised vector in $\mathbb{C}^n$ for which $\rho = \psi\psi^*$. We can now construct two
orthonormal bases $\{u_i\}_{i=1}^n$ and $\{v_i\}_{i=1}^n$, with $u_1 = v_1 = \psi$ and all other vectors
unspecified for the time being, and express $X$ in these bases as $X = \sum_{i,j} x_{ij} u_i v_j^*$
with $x_{ij} = \langle u_i, X v_j\rangle$. The Cartesian radius of $X$ is then given by

$$\begin{aligned}
r_C(X)^2 &= \tfrac{1}{2}\langle\psi, (XX^* + X^*X)\psi\rangle - |\langle\psi, X\psi\rangle|^2 \\
&= \tfrac{1}{2}\big(\langle u_1, XX^* u_1\rangle + \langle v_1, X^*X v_1\rangle\big) - |\langle u_1, X v_1\rangle|^2 \\
&= \tfrac{1}{2}\Big(\sum_{j=1}^n |x_{1j}|^2 + \sum_{i=1}^n |x_{i1}|^2\Big) - |x_{11}|^2 \\
&= \tfrac{1}{2}\Big(\sum_{j=2}^n |x_{1j}|^2 + \sum_{i=2}^n |x_{i1}|^2\Big).
\end{aligned}$$

We can use the remaining degrees of freedom in the two bases for choosing
their vectors in such a way that all matrix elements $x_{1j}$ and $x_{j1}$ with $j > 2$
are zero (take $u_2$ along the component of $Xv_1$ orthogonal to $u_1$, and $v_2$ along
the component of $X^*u_1$ orthogonal to $v_1$). Then we get the simple expression
$r_C(X)^2 = (|x_{12}|^2 + |x_{21}|^2)/2$.

Obviously, an upper bound on $(|x_{12}|^2 + |x_{21}|^2)/2$ is
$\sum_{j=1}^n (|x_{1j}|^2 + |x_{2j}|^2)/2 = \|X'\|_2^2/2$, where $X'$ is the $2 \times n$ matrix
consisting of the upper 2 rows of $X$ in the chosen bases. This can be written
differently: let $P$ be the $2 \times n$ matrix given by $P = e^1 u_1^* + e^2 u_2^*$, then
$\|X'\|_2 = \|PX\|_2$. Hence, $\|X'\|_2^2 = \mathrm{Tr}(PXX^*P^*) = \mathrm{Tr}(P^*PXX^*)$. Now note
that $P^*P$ is a rank 2 partial isometry. Thus an upper bound on $\|X'\|_2^2$ is given
by the maximum of $|\mathrm{Tr}(AXX^*)|$ over all rank 2 partial isometries $A$. By Ky
Fan's maximum principle, this maximum is equal to
$\sigma_1(XX^*) + \sigma_2(XX^*) = \sigma_1(X)^2 + \sigma_2(X)^2$. Therefore, $(\sigma_1(X)^2 + \sigma_2(X)^2)/2$ is an
upper bound on $\|X'\|_2^2/2$ and also on $r_C(X)^2$, proving the second inequality
of the theorem. □
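
Theorems 10 and 12 can be illustrated together on a 2 x 2 example. The sketch below estimates the three radii through the rank-1 form of the dual definition (31), using a grid over pure states, and compares them with the norms in Theorem 12. It assumes $|X|_L^2 = XX^*$, $|X|_R^2 = X^*X$ and $|X|_C^2 = (XX^* + X^*X)/2$; in dimension 2 the norm $\|X\|_{(2),2}$ coincides with the Frobenius norm. All function names and the test matrix are hypothetical.

```python
import cmath
import math

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def modulus_sq(a, kind):
    """|a|_*^2 under the assumed conventions L: a a^*, R: a^* a, C: their mean."""
    left, right = mul(a, dag(a)), mul(dag(a), a)
    if kind == "L":
        return left
    if kind == "R":
        return right
    return [[(left[i][j] + right[i][j]) / 2 for j in range(2)] for i in range(2)]

def expect(psi, m):
    """Quadratic form <psi, m psi>."""
    return sum(psi[i].conjugate() * m[i][j] * psi[j]
               for i in range(2) for j in range(2))

def radius(X, kind, n=60):
    """Grid estimate of r_*(X) from pure states in the dual definition (31)."""
    M = modulus_sq(X, kind)
    best = 0.0
    for s in range(n + 1):
        t = (math.pi / 2) * s / n
        for k in range(n):
            psi = [math.cos(t), cmath.exp(2j * math.pi * k / n) * math.sin(t)]
            best = max(best, expect(psi, M).real - abs(expect(psi, X)) ** 2)
    return math.sqrt(best)

def singular_values(X):
    """Singular values (largest first) of a 2x2 matrix."""
    h = mul(dag(X), X)
    a, d, b = h[0][0].real, h[1][1].real, h[0][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))

X = [[0, 2], [1, 0]]
s1, s2 = singular_values(X)
r_L, r_R, r_C = (radius(X, kind) for kind in ("L", "R", "C"))
ky_fan_bound = math.sqrt((s1 ** 2 + s2 ** 2) / 2)   # ||X||_(2),2 / sqrt(2)
print("r_C, r_L, r_R:", r_C, r_L, r_R)
print("r_L vs ||X||_inf:", r_L, s1)
print("r_C vs bound:", r_C, ky_fan_bound)
```

For this particular matrix both bounds of Theorem 12 turn out to be attained: $r_L = r_R = 2 = \|X\|_\infty$ and $r_C = \sqrt{5/2} = \|X\|_{(2),2}/\sqrt{2}$.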

We obtain the following corollary:

Corollary 4 For every matrix $X$,

$$r_L(X) \le \|X\|_p, \qquad p \ge 1,$$

and

$$r_C(X) \le \begin{cases} 2^{-1/p}\,\|X\|_p, & p \ge 2, \\ 2^{-1/2}\,\|X\|_p, & 1 \le p \le 2. \end{cases}$$

These inequalities are sharp.



Proof. Consider first the L-radius. As is well-known, $\|X\|_{(1)} \le \|X\|_p$ for all
$p \ge 1$. Equality is obtained for $X = e^{12}$.

For the C-radius, we have, by Corollary 2 with $p = 2$,
$\frac{1}{\sqrt{2}}\|XX^*\|_{(2)}^{1/2} \le \|X\|_p/\|F\|_p = 2^{-1/p}\|X\|_p$ for all $p \ge 2$, so that
$r_C(X) \le 2^{-1/p}\|X\|_p$ for all $p \ge 2$. In addition, since $\|X\|_p \ge \|X\|_2$ for
$1 \le p \le 2$, we also have $r_C(X) \le 2^{-1/2}\|X\|_p$ for $1 \le p \le 2$.

Equality for $1 \le p \le 2$ is obtained for $X = e^{12}$, and for $p \ge 2$ for $X = F$. □

It would have been nice if the following had been true:

$$r_C(X) \le \|\,|X|_C\,\|_{(2)}/2, \tag{34}$$

since in combination with Theorem 6 this would have given

$$r_C(X) \le |||\,|X|_C\,||| \,/\, |||F|||$$
and, in particular, for Schatten p-norms

$$r_C(X) \le 2^{-1/p}\,\|\,|X|_C\,\|_p.$$

In fact, for $d > 2$ none of these inequalities are true. If they had been, the
Bhatia-Kittaneh inequalities (24) would have given an alternative proof of
Corollary 4. The fact that numerical tests showed (34) to hold for $d = 2$
provided the inspiration for the proof of Theorem 12.



7.4 Application to commutator bounds 

We finish by giving the promised sharp bound on the Frobenius norm of a 
commutator: 

Corollary 5 For general complex matrices $X$ and $Y$, and $p \ge 1$,

$$\|[X,Y]\|_2 \le \sqrt{2}\,\|X\|_2\,\|Y\|_{(2),2} \le 2^{\max(1/2,\,1-1/p)}\,\|X\|_2\,\|Y\|_p.$$
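
The first inequality of the corollary is easy to try out on 2 x 2 matrices, where $\|Y\|_{(2),2}$ is just the Frobenius norm $\|Y\|_2$. The sketch below evaluates both sides for a pair of Pauli-type matrices, for which the bound is attained, and for an arbitrary second pair. The function names and the test matrices are illustrative choices only.

```python
import math

def mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(x, y):
    xy, yx = mul(x, y), mul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]

def frobenius(a):
    """Frobenius (Schatten 2) norm."""
    return math.sqrt(sum(abs(a[i][j]) ** 2 for i in range(2) for j in range(2)))

def both_sides(x, y):
    """Return (||[x, y]||_2, sqrt(2) * ||x||_2 * ||y||_2)."""
    return frobenius(commutator(x, y)), math.sqrt(2) * frobenius(x) * frobenius(y)

# Pauli-type pair: the commutator has Frobenius norm 2*sqrt(2), equal to the bound.
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]
print(both_sides(SX, SZ))

# Arbitrary pair: the bound holds with room to spare.
X = [[1, 2j], [0, -1]]
Y = [[0.5, 1], [1j, 2]]
print(both_sides(X, Y))
```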



Proof. In the proof of Theorem 1 we already found that

$$\|XY - YX\|_2^2 \le 4\,\|X\|_2^2\,\big(\mathrm{Tr}[\rho(Y^*Y + YY^*)/2] - |\mathrm{Tr}[\rho Y]|^2\big).$$



The second factor is what we coined the Cartesian variance of $Y$, $\mathrm{Var}_{C,\rho}(Y)$,
and is thus bounded above by $r_C(Y)^2$. By Theorem 12 and Corollary 4 we find

$$\|XY - YX\|_2 \le \sqrt{2}\,\|X\|_2\,\|Y\|_{(2),2}$$

and the other stated inequalities. □